Template Sampling for Leveraging Domain Knowledge in Information Extraction
Abstract
We first describe a feature-rich discriminative Conditional Random Field (CRF) model for Information Extraction in the workshop announcements domain, which offers good baseline performance in the PASCAL shared task. We then propose a method for leveraging domain knowledge in Information Extraction tasks: after generating sample labellings from a trained sequence classifier, we score candidate document labellings as one-value-per-field templates according to domain feasibility. Our relational models evaluate these templates according to our intuitions about agreement in the domain: workshop acronyms should resemble their names, and workshop dates should occur after paper submission dates. These methods yield a 5% F-score improvement in fields retrieved when sampling labellings from a Maximum-Entropy Markov Model; however, we do not observe an improvement over a CRF model. We discuss reasons for this, including the problem of recovering all field instances from a best template, and propose future work in adapting such a model to the CRF, which is the better standalone system.
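The template-scoring idea can be illustrated with a minimal sketch. It is not the paper's relational model: the `Template` field names, the two hand-written agreement checks, the ranking rule (feasibility first, sample frequency as tie-breaker), and the toy dates are all assumptions made for illustration, and the sampled labellings are taken as already reduced to one value per field.

```python
from collections import Counter
from datetime import date
from typing import NamedTuple


class Template(NamedTuple):
    """One value per field; the field names are hypothetical."""
    name: str
    acronym: str
    submission_date: date
    workshop_date: date


def acronym_agrees(acronym: str, name: str) -> bool:
    """True if the acronym's letters appear, in order, among the word initials of the name."""
    initials = iter(w[0].upper() for w in name.split() if w[0].isalpha())
    # `ch in initials` consumes the iterator, so this is an in-order subsequence test.
    return all(ch in initials for ch in acronym.upper() if ch.isalpha())


def feasibility(t: Template) -> int:
    """Count how many of the two domain agreement checks the template satisfies."""
    score = 0
    if acronym_agrees(t.acronym, t.name):
        score += 1
    if t.workshop_date > t.submission_date:
        score += 1
    return score


def best_template(samples: list[Template]) -> Template:
    """Pick the most feasible sampled template; break ties by how often it was sampled."""
    counts = Counter(samples)
    return max(counts, key=lambda t: (feasibility(t), counts[t]))


# Toy samples standing in for labellings drawn from a trained sequence classifier.
good = Template("Information Extraction Workshop", "IEW", date(2005, 3, 1), date(2005, 6, 20))
bad = Template("Information Extraction Workshop", "KDD", date(2005, 6, 20), date(2005, 3, 1))
samples = [bad, bad, good]
best = best_template(samples)
```

Here `best` is the `good` template even though `bad` was sampled more often, because feasibility is ranked before frequency. A real system would more plausibly combine the classifier's probability with a learned feasibility score rather than use hard checks.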
Similar resources
Presenting a method for extracting structured domain-dependent information from Farsi Web pages
Extracting structured information about entities from web texts is an important task in web mining, natural language processing, and information extraction. Information extraction is useful in many applications including search engines, question-answering systems, recommender systems, machine translation, etc. An information extraction system aims to identify the entities from the text and extr...
Extracting Template for Knowledge-based Question-Answering Using Conditional Random Fields
In this paper, we present an information extraction system that extracts template elements for a question-answering (QA) system in the domain of encyclopedia. We use Conditional Random Fields to extract templates from the texts of an encyclopedia. Using the proposed approach, we could achieve a 74.89% precision and a 55.77% F1 in the template extraction. In the question classification, we could...
Attribute Relation Extraction from Template-inconsistent Semi-structured Text by Leveraging Site-level Knowledge
A variety of methods have been proposed for attribute-value extraction from semi-structured text with consistent templates (strict semi-text). However, when the templates in semi-structured text are inconsistent (weak semi-text), these methods work poorly. To overcome the template-inconsistency problem, in this paper we propose a novel method to leverage site-level knowledge for attribute-va...
Focusing on Scenario Recognition in Information Extraction
This paper reports a research effort in Information Extraction, especially in template pattern matching. Our approach uses rich domain knowledge in the football (soccer) area and logical form representation for the necessary inference of facts and template filling. Our system FRET (Football Reports Extraction Templates) is compatible with the language-engineering environment GATE and handles its i...
Focusing on Scenario Recognition in Infomation Extraction
This paper reports a research effort in Information Extraction, especially in template pattern matching. Our approach uses rich domain knowledge in the football (soccer) area and logical form representation for the necessary inference of facts and template filling. Our system FRET (Football Reports Extraction Templates) is compatible with the language-engineering environment GATE and handles its ...